An XML-based Tool for Tracking English Inclusions in German Text

Authors

Beatrice Alex

Claire Grover

Abstract

The use of lexicons and corpora advances both linguistic research and the performance of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web, to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring changing patterns of English inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to linguistic and NLP research.
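The general idea of combining two lexicons with web evidence can be illustrated with a minimal sketch. This is not the authors' implementation: the toy word lists, the function names classify_token and tag_sentence, and the web_counts callback (a stand-in for whatever web evidence a real system would gather) are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Toy lexicons standing in for real lexical databases (hypothetical entries).
GERMAN_LEXICON = {"das", "ist", "ein", "neues", "hand"}
ENGLISH_LEXICON = {"security", "update", "hand"}

# Hypothetical callback: returns (german_hits, english_hits) for a word.
WebCounts = Callable[[str], Tuple[int, int]]


def classify_token(token: str, web_counts: Optional[WebCounts] = None) -> str:
    """Label a token as 'EN', 'DE', or 'UNK' (undecided)."""
    word = token.lower()
    in_de = word in GERMAN_LEXICON
    in_en = word in ENGLISH_LEXICON
    # A word found in exactly one lexicon is assigned that language.
    if in_en and not in_de:
        return "EN"
    if in_de and not in_en:
        return "DE"
    # Words in both or neither lexicon fall back to web evidence, if any.
    if web_counts is not None:
        de_hits, en_hits = web_counts(word)
        if en_hits > de_hits:
            return "EN"
        if de_hits > en_hits:
            return "DE"
    return "UNK"


def tag_sentence(tokens: List[str],
                 web_counts: Optional[WebCounts] = None) -> List[Tuple[str, str]]:
    """Pair each token with its language label."""
    return [(tok, classify_token(tok, web_counts)) for tok in tokens]


if __name__ == "__main__":
    print(tag_sentence(["Das", "ist", "ein", "neues", "Security", "Update"]))
```

Words that appear in both lexicons (here "hand") stay undecided unless the hypothetical web_counts callback tips the balance, which is where a second source of evidence such as the Web becomes useful.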

Similar resources

Investigating Prosodic Modifications for Polyglot Text-to-Speech Synthesis

This paper investigates the need for applying English prosody when synthesising English portions of mixed English/German texts using a German-based polyglot text-to-speech (TTS) synthesis system. The polyglot system is based on a monolingual German TTS system, which uses a phone mapping from English to German to synthesise English texts. Two systems with varying degrees of assimilation to Engli...

An Unsupervised System for Identifying English Inclusions in German Text

We present an unsupervised system that exploits linguistic knowledge resources, namely English and German lexical databases and the World Wide Web, to identify English inclusions in German text. We describe experiments with this system and the corpus which was developed for this task. We report the classification results of our system and compare them to the performance of a trained machine lea...

Automatic detection of English inclusions in mixed-lingual text with an application to parsing

The influence of English continues to grow to the extent that its expressions have begun to permeate the original forms of other languages. It has become more acceptable, and in some cases fashionable, for people to combine English phrases with their native tongue. This language mixing phenomenon typically occurs initially in conversation and subsequently in written form. In fact, there is evid...

Integrating Language Knowledge Resources to Extend the English Inclusion Classifier to a New Language

This paper presents an unsupervised system that classifies English inclusions in written text. It will demonstrate that extending this English inclusion classifier, which was originally designed for German, requires minimal time and effort to adapt to a new language, in this case French. The analysis of several evaluation experiments carried out on French and German data shows that the system p...

Bilingual Information Retrieval with HyREX and Internet Translation Services

HyREX is the Hypermedia Retrieval Engine for XML. Its extensibility is based on the implementation of physical data independence; its query interface on the conceptual level consists of data types with respective vague search predicates. This concept enabled us to add search predicates for the data type text for doing bilingual text retrieval. Our implementation uses free Internet resources fo...

Publication date: 2004
